Abstract: In this report, we introduce the hypothesis testing problem of determining, among $n$ random variables, the $k$ random variables whose probability distribution differs from that of the remaining $(n - k)$ random variables. For this purpose, instead of using separate observations of each random variable, we propose to use mixed observations, which are functions of multiple random variables. It is demonstrated that

$$O\left(\frac{k \log(n)}{\min_{P_i \neq P_j} C(P_i, P_j)}\right)$$

observations are sufficient for correctly identifying the $k$ anomalous random variables with high probability, where $C(P_i, P_j)$ is the Chernoff information between two possible distributions $P_i$ and $P_j$ for the proposed mixed observations. This can potentially lead to significant reductions in sensing resources in certain applications.

I. Introduction 

Mathematically, we consider $n$ independent random variables $X_1, X_2, \ldots, X_n$. Of these $n$ random variables, $(n - k)$ follow a distribution $f_1(\cdot)$, while the remaining $k$ random variables follow another distribution $f_2(\cdot)$, where $k \ll n$. However, it is unknown which $k$ random variables follow the distribution $f_2(\cdot)$. Our objective is to identify these $k$ anomalous random variables. This problem has applications in anomaly detection, for example, the quickest detection of potential hazards or changes [1], [2].

One natural way to find the $k$ anomalous random variables is to perform one-by-one hypothesis testing on the $n$ random variables. One can take $l$ samples of each random variable $X_i$, $1 \le i \le n$, and then use existing hypothesis testing techniques to determine whether $X_i$ follows the probability distribution $f_1(\cdot)$ or $f_2(\cdot)$. Thus, to correctly identify the $k$ anomalous random variables with high probability, one-by-one hypothesis testing needs at least $\Omega(n)$ samples, since every random variable must be sampled at least once. This inevitably creates an enormous burden on data collection and processing, especially when $n$ is extremely large, as it is in certain applications.
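To make this baseline concrete, here is a minimal Python sketch of one-by-one testing. It assumes, purely for illustration, that $f_1 = \mathcal{N}(0, \sigma^2)$ and $f_2 = \mathcal{N}(\mu, \sigma^2)$ with $\mu > 0$; under that assumption the likelihood-ratio test on the $l$ samples of $X_i$ reduces to comparing their sample mean with $\mu/2$. The helper names `one_by_one_test` and `simulate` are hypothetical.

```python
import random

def one_by_one_test(samples_per_var, mu):
    """Return the set of indices flagged as anomalous by per-variable testing.

    Illustrative assumption: f1 = N(0, sigma^2), f2 = N(mu, sigma^2), mu > 0.
    The log-likelihood ratio of l samples is (mu / sigma^2) * (sum(x) - l * mu / 2),
    so the test with likelihood-ratio threshold 1 flags X_i when its sample
    mean exceeds mu / 2.
    """
    return {i for i, xs in enumerate(samples_per_var) if sum(xs) / len(xs) > mu / 2}

def simulate(n, k, l, mu=1.0, sigma=1.0, seed=0):
    """Draw l samples of each X_i, test each variable separately, report (success, cost)."""
    rng = random.Random(seed)
    anomalous = set(rng.sample(range(n), k))
    samples = [[rng.gauss(mu if i in anomalous else 0.0, sigma) for _ in range(l)]
               for i in range(n)]
    total_samples = n * l  # every variable is sampled l times, however small k is
    return one_by_one_test(samples, mu) == anomalous, total_samples
```

For instance, `simulate(n=50, k=3, l=40, mu=2.0)` spends $50 \times 40 = 2000$ samples. The cost grows as $n \cdot l$ regardless of $k$, which is the burden that mixed observations aim to reduce.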

In this paper, since $k \ll n$, we borrow ideas from compressed sensing [3], [4], [5] and propose to use non-adaptive mixed sampling of the $n$ random variables instead of one-by-one sampling. The idea is to make each sample a function of the $n$ random variables, instead of a realization of an individual random variable. Our preliminary analysis shows that this approach has an advantage: the number of samples needed to correctly identify the $k$ anomalous random variables can be reduced to $O(k \log(n))$. To the best of our knowledge, the closest results to ours are [6], [7], [8], which use adaptive observations of individual random variables instead of the non-adaptive mixed observations used in this paper. We note that the overall number of observations needed when the random variables are sampled individually is at least $\Omega(n)$ [6], [7], [8].

II. Mathematical Models 

We consider $n$ independent random variables $X_1, X_2, \ldots, X_n$. Suppose we take $m$ mixed observations of the $n$ random variables. Each sample we take, denoted by $g_j(X_1, X_2, \ldots, X_n)$, $1 \le j \le m$, is a function of the $n$ random variables $X_1, X_2, \ldots, X_n$. In this paper, we only consider the case where the functions $g_j$ are linear, and we further assume that the functions $g_j$, $1 \le j \le m$, are identical; we denote this common function by $g(\cdot)$. Thus, for $1 \le j \le m$, the $j$-th measurement is

$$Y_j = g(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} a_i X_i,$$

where each $a_i$ is a real number. In our setting, each random variable $X_i$ takes, in every measurement, a value independent of its previous realizations, whereas in the now well-known compressed sensing problem, the $i$-th unknown element is a deterministic quantity that takes the same value in every measurement. So, roughly speaking, our problem is a probabilistic version of the compressed sensing problem.

Among the $n$ random variables, $(n - k)$ random variables follow a known probability distribution $f_1(\cdot)$ and $k$ random variables follow another known probability distribution $f_2(\cdot)$. In the next section, we describe an algorithm that identifies the $k$ anomalous random variables using $m$ linear mixed observations.
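A minimal Python sketch of this observation model follows. It assumes, for illustration only, that $f_1 = \mathcal{N}(0, \sigma^2)$ and $f_2 = \mathcal{N}(\mu, \sigma^2)$; the model above allows any two known distributions. Every $X_i$ is redrawn for each measurement, so $Y_j$ is a weighted sum of fresh realizations rather than a fixed linear combination of fixed unknowns. The function name `mixed_observations` is hypothetical.

```python
import random

def mixed_observations(a, anomalous, m, mu=1.0, sigma=1.0, seed=0):
    """Return m linear mixed observations Y_j = sum_i a[i] * X_i.

    Illustrative assumption: X_i ~ N(0, sigma^2) for normal indices and
    X_i ~ N(mu, sigma^2) for indices in `anomalous`. Each X_i is drawn
    independently for every observation j, unlike the deterministic unknowns
    of compressed sensing.
    """
    rng = random.Random(seed)
    ys = []
    for _ in range(m):
        y = 0.0
        for i, ai in enumerate(a):
            xi = rng.gauss(mu if i in anomalous else 0.0, sigma)  # fresh realization of X_i
            y += ai * xi
        ys.append(y)
    return ys
```

Under this Gaussian assumption, if $S$ is the set of anomalous indices, then each $Y_j \sim \mathcal{N}\big(\mu \sum_{i \in S} a_i,\ \sigma^2 \sum_{i=1}^{n} a_i^2\big)$. With $a = (1, 2, 4, 8)$, $S = \{1, 3\}$ (0-based) and $\mu = \sigma = 1$, the empirical mean and variance of many observations approach $10$ and $85$.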

III. Algorithm 

We note that there are $L = \binom{n}{k}$ possible probability distributions for the output of the function $g(\cdot)$, depending on which $k$ random variables are anomalous. We denote these possible probability distributions by $P_1, P_2, \ldots, P_L$. Our simple algorithm finds the true distribution by performing pairwise Neyman-Pearson hypothesis testing [9] among these $L$ probability distributions. We denote the $m$ observations by $Y_1, Y_2, \ldots, Y_m$.
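A minimal Python sketch of this pairwise procedure follows, again assuming the illustrative Gaussian model $f_1 = \mathcal{N}(0, \sigma^2)$, $f_2 = \mathcal{N}(\mu, \sigma^2)$, so that a hypothesis $S$ of anomalous indices gives $Y_j \sim \mathcal{N}(\mu \sum_{i \in S} a_i, \sigma^2 \sum_i a_i^2)$. Each pairwise Neyman-Pearson test here uses a log-likelihood-ratio threshold of 0; that threshold is our assumption, not a choice stated in the text. The function names are hypothetical, and enumerating all $\binom{n}{k}$ hypotheses is only practical for small $n$.

```python
import itertools
import math

def loglik(ys, mean, var):
    """Gaussian log-likelihood of the observations under N(mean, var)."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)
               for y in ys)

def pairwise_np_test(ys, a, k, mu=1.0, sigma=1.0):
    """Return the set of anomalous indices, or None to declare a failure."""
    n = len(a)
    var = sigma ** 2 * sum(ai * ai for ai in a)       # same under every hypothesis
    hyps = list(itertools.combinations(range(n), k))  # the L = C(n, k) candidates
    ll = [loglik(ys, mu * sum(a[i] for i in S), var) for S in hyps]
    for j, S in enumerate(hyps):
        # P_j must win every pairwise test it takes part in (log-LR threshold 0)
        if all(ll[j] > ll[i] for i in range(len(hyps)) if i != j):
            return set(S)
    return None
```

With distinct powers of two as coefficients, all subset sums differ, so the $L$ distributions are distinct and, for example, observations all equal to $5$ with $a = (1, 2, 4, 8)$, $k = 2$ single out $\{0, 2\}$. With equal coefficients every hypothesis yields the same distribution of $Y_j$, no hypothesis wins all of its pairwise tests, and the sketch declares a failure.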

Data: observation data $Y_1, Y_2, \ldots, Y_m$
Result: $k$ anomalous random variables

• For all pairs of distinct probability distributions $P_i$ and $P_j$ ($1 \le i, j \le L$ and $i \neq j$), perform Neyman-Pearson testing between two hypotheses:

  - $Y_1, Y_2, \ldots, Y_m$ follow probability distribution $P_i$;

  - $Y_1, Y_2, \ldots, Y_m$ follow probability distribution $P_j$.

• if there exists some $j^*$ such that $P_{j^*}$ is the winning probability distribution in every pairwise hypothesis test in which it is involved, then

    declare the $k$ random variables producing $P_{j^*}$ to be the anomalous random variables;
  else

    declare a failure in finding the $k$ anomalous random variables;
  end

Algorithm 1: Hypothesis Testing from Mixed Observations

IV. Number of Samples

In Algorithm 1, for two probability distributions $P_i$ and $P_j$, we choose the likelihood ratio threshold of the Neyman-Pearson test so that the error probability decreases with the largest possible error exponent, namely the Chernoff information between $P_i$ and $P_j$:

$$C(P_i, P_j) = -\min_{0 \le \lambda \le 1} \log \left( \int P_i^{\lambda}(x)\, P_j^{1-\lambda}(x)\, dx \right).$$
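The Chernoff information can be evaluated numerically. Below is a minimal brute-force Python sketch (trapezoidal integration plus a grid search over $\lambda$), taking the logarithm base 2 so that it matches the $2^{-mC}$ error scaling used in this section. For two Gaussians with a common variance $\sigma^2$ and means $\nu_1, \nu_2$, the closed form is $(\nu_1 - \nu_2)^2 / (8\sigma^2)$ nats, which provides a check. The function names, integration range, and grid sizes are illustrative choices of ours.

```python
import math

def gaussian_pdf(x, mean, sd):
    return math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def chernoff_information_bits(p, q, lo, hi, grid=2000, lambdas=101):
    """C(P, Q) = -min_{0<=lam<=1} log2 of the integral of p^lam * q^(1-lam).

    The integral uses the trapezoidal rule on [lo, hi]; lambda is searched on
    a uniform grid over [0, 1]. Both steps are numerical approximations.
    """
    dx = (hi - lo) / grid
    xs = [lo + t * dx for t in range(grid + 1)]
    best = math.inf
    for s in range(lambdas):
        lam = s / (lambdas - 1)
        vals = [p(x) ** lam * q(x) ** (1 - lam) for x in xs]
        integral = dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
        best = min(best, math.log2(integral))
    return -best
```

For $\mathcal{N}(0, 1)$ versus $\mathcal{N}(2, 1)$, integrated over $[-10, 12]$, the minimizing $\lambda$ is $1/2$ and both the numerical and the closed-form values are about $0.721$ bits.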



So overall, the smallest possible error exponent for making an error between any pair of probability distributions is

$$E = \min_{1 \le i, j \le L,\ i \neq j} C(P_i, P_j).$$
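Under the same illustrative Gaussian model ($f_1 = \mathcal{N}(0, \sigma^2)$, $f_2 = \mathcal{N}(\mu, \sigma^2)$), the minimum $E$ has a simple form: every $C(P_i, P_j)$ depends only on the gap between the means $\mu \sum_{i \in S} a_i$ of the two hypotheses, so $E$ is determined by the two closest subset sums. A minimal Python sketch that enumerates the $\binom{n}{k}$ hypotheses (feasible only for small $n$; the function name is hypothetical):

```python
import itertools
import math

def min_chernoff_bits(a, k, mu=1.0, sigma=1.0):
    """E = min over pairs of hypotheses of C(P_i, P_j), in bits.

    Illustrative Gaussian model: hypothesis S gives
    Y ~ N(mu * sum_{i in S} a_i, sigma^2 * sum_i a_i^2), and for two Gaussians
    with a common variance, C = (mean gap)^2 / (8 * variance) nats.
    """
    var = sigma ** 2 * sum(ai * ai for ai in a)
    means = sorted(mu * sum(a[i] for i in S)
                   for S in itertools.combinations(range(len(a)), k))
    # After sorting, the closest pair of means is adjacent
    gap = min(hi - lo for lo, hi in zip(means, means[1:]))
    return gap ** 2 / (8 * var) / math.log(2)
```

A result of zero means that two hypotheses induce the same distribution of $Y_j$, so no number of samples can separate them; this is why the choice of the coefficients $a_i$ matters.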

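Without loss of generality, we assume that $P_1$ is the true probability distribution of the observation data $Y$. Since the error probability $P_e$ of the Neyman-Pearson test between $P_1$ and $P_j$ scales like $P_e \approx 2^{-mC(P_1, P_j)} \le 2^{-mE}$, a union bound over the $L - 1$ possible pairs $(P_1, P_j)$ shows that the probability that $P_1$ is not correctly identified as the true probability distribution scales at most as $L\,2^{-mE}$. Because $L = \binom{n}{k} \le n^k$, we have $\log_2 L \le k \log_2(n)$, so this bound falls below any fixed target error probability once $m$ is a sufficiently large constant multiple of $k \log(n) E^{-1}$. Hence $O(k \log(n) E^{-1})$ samples are enough to identify the $k$ anomalous random variables with high probability. In particular, whenever $E^{-1}$ grows more slowly than $n / (k \log(n))$, this is a significant reduction compared with one-by-one testing, which needs at least on the order of $n$ samples.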
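The union-bound argument translates directly into a sample count. A minimal Python sketch, assuming a target failure probability $\delta$ (a parameter we introduce for illustration) and $E$ measured in bits: it returns the smallest integer $m$ with $L\,2^{-mE} \le \delta$. The function name is hypothetical; `math.comb` requires Python 3.8 or later.

```python
import math

def samples_needed(n, k, E_bits, delta=0.01):
    """Smallest integer m with C(n, k) * 2**(-m * E_bits) <= delta."""
    L = math.comb(n, k)  # number of candidate sets of anomalous variables
    # L * 2^(-mE) <= delta  <=>  m >= (log2 L + log2(1/delta)) / E
    return math.ceil((math.log2(L) + math.log2(1.0 / delta)) / E_bits)
```

For example, with an assumed $E = 1$ bit, `samples_needed(1000, 5, 1.0)` returns 50, well below the $n = 1000$ observations that sampling each variable even once would require. Since $\log_2 L \le k \log_2 n$, the count never exceeds $\lceil (k \log_2 n + \log_2(1/\delta)) / E \rceil$.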

V. Conclusion 

In this report, we have introduced the problem of finding the $k$ anomalous random variables that follow a different probability distribution among $n$ random variables, using non-adaptive mixed observations of these $n$ random variables. Preliminary analysis has shown that mixed observations, compared with individual observations of random variables, can significantly reduce the number of samples needed to identify the anomalous random variables. Unlike in general compressed sensing problems, in our setting each random variable may take different realizations in different observations.

There are numerous future directions to pursue in comparing mixed observations of random variables with individual observations of random variables. Here is an incomplete list of them.

• What are the results when the $n$ random variables are correlated instead of independent?

• What are the optimal constructions that minimize the number of mixed observations while identifying the $k$ anomalous random variables with high probability?

• Do nonlinear mixed observations offer advantages over linear mixed observations?

• What advantages can we gain when we allow the linear mixing to vary across different observations?

• What efficient algorithms are available to identify the $k$ anomalous random variables from the probabilistic mixed observations?

• What are the information-theoretic lower and upper bounds on the number of samples needed to identify the anomalous random variables, especially when there are constraints on the allowable mixed observations?

• Under constraints on the allowable mixed observations, how does the minimum Chernoff information $E$ scale as a function of $k$ and $n$?

• How are the hypothesis testing results from mixed observations affected by observation noise and errors?

• What if the $k$ anomalous random variables are allowed to have different distributions? Can we further distinguish which distribution belongs to which random variable? We note that when anomalous random variables are allowed to have different distributions, the traditional compressed sensing problem is just a special case of this hypothesis testing problem from mixed observations.